This blog post presents a regional analysis of data collected at a weather station in McCall, Idaho. The record goes back to March 1906 and includes the daily minimum and maximum temperatures measured at the station.
## 'data.frame': 38990 obs. of 24 variables:
## $ STATION: Factor w/ 1 level "USC00105708": 1 1 1 1 1 1 1 1 1 1 ...
## $ NAME : Factor w/ 1 level "MCCALL, ID US": 1 1 1 1 1 1 1 1 1 1 ...
## $ DATE : Factor w/ 38990 levels "1906-03-01","1906-03-02",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ DAPR : int NA NA NA NA NA NA NA NA NA NA ...
## $ DASF : int NA NA NA NA NA NA NA NA NA NA ...
## $ MDPR : num NA NA NA NA NA NA NA NA NA NA ...
## $ MDSF : num NA NA NA NA NA NA NA NA NA NA ...
## $ PRCP : num 0 0 0 3.8 0 0 0 0 0 0 ...
## $ SNOW : num 0 0 0 51 0 0 0 0 0 0 ...
## $ SNWD : num 1092 1092 1067 1118 1092 ...
## $ TMAX : num 2.2 1.1 2.2 2.2 3.9 8.3 8.9 8.3 9.4 6.7 ...
## $ TMIN : num -11.1 -19.4 -10.6 -2.2 -6.7 -9.4 -10.6 -9.4 -7.8 -8.3 ...
## $ TOBS : num -3.9 -2.2 -4.4 1.1 0 3.3 1.7 3.3 6.1 6.1 ...
## $ WT01 : int NA NA NA NA NA NA NA NA NA NA ...
## $ WT03 : int NA NA NA NA NA NA NA NA NA NA ...
## $ WT04 : int NA NA NA NA NA NA NA NA NA NA ...
## $ WT05 : int NA NA NA NA NA NA NA NA NA NA ...
## $ WT06 : int NA NA NA NA NA NA NA NA NA NA ...
## $ WT08 : int NA NA NA NA NA NA NA NA NA NA ...
## $ WT09 : int NA NA NA NA NA NA NA NA NA NA ...
## $ WT11 : int NA NA NA NA NA NA NA NA NA NA ...
## $ WT14 : int NA NA NA NA NA NA NA NA NA NA ...
## $ WT16 : int NA NA NA NA NA NA NA NA NA NA ...
## $ WT18 : int NA NA NA NA NA NA NA NA NA NA ...
## [1] NA
These graphs plot the data. They show the changes in daily maximum temperature, which varies a lot between the hot summers and the cold winters in McCall. I then converted the data from daily highs and lows to monthly means.
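The conversion from daily values to monthly means can be sketched with `aggregate()`. This is a minimal sketch rather than the exact code behind the post: `demo` is a made-up stand-in for `climate_data`, its values are synthetic, and I am assuming the date column is already a `Date` (called `NewDate` here, as in the regression output).

```r
# Synthetic daily data standing in for climate_data (values are made up)
set.seed(1)
dates <- seq(as.Date("1906-03-01"), as.Date("1907-02-28"), by = "day")
day_of_year <- as.numeric(format(dates, "%j"))
demo <- data.frame(
  NewDate = dates,
  TMAX = 10 + 12 * sin(2 * pi * (day_of_year - 105) / 365) +
    rnorm(length(dates), sd = 3)
)

# Pull the month and year out of the date as character columns
demo$Month <- format(demo$NewDate, "%m")
demo$Year  <- format(demo$NewDate, "%Y")

# Average the daily highs within each month of each year
MonthlyDemo <- aggregate(TMAX ~ Month + Year, data = demo, FUN = mean)

# Numeric copies are convenient for regression and plotting
MonthlyDemo$YEAR  <- as.numeric(as.character(MonthlyDemo$Year))
MonthlyDemo$MONTH <- as.numeric(as.character(MonthlyDemo$Month))

str(MonthlyDemo)
```

The result has one row per month and year, with the same five columns as the monthly data frame shown in the output further down.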
##
## Call:
## lm(formula = TMAX ~ NewDate, data = climate_data)
##
## Coefficients:
## (Intercept) NewDate
## 1.243e+01 2.108e-05
##
## Call:
## lm(formula = TMAX ~ NewDate, data = climate_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39.287 -9.448 -1.020 9.864 27.894
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.243e+01 5.698e-02 218.074 < 2e-16 ***
## NewDate 2.108e-05 4.920e-06 4.285 1.83e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.07 on 38037 degrees of freedom
## (951 observations deleted due to missingness)
## Multiple R-squared: 0.0004824, Adjusted R-squared: 0.0004562
## F-statistic: 18.36 on 1 and 38037 DF, p-value: 1.834e-05
## 'data.frame': 1297 obs. of 5 variables:
## $ Month: chr "03" "04" "05" "06" ...
## $ Year : chr "1906" "1906" "1906" "1906" ...
## $ TMAX : num 4.59 12.25 15.22 16.82 28.47 ...
## $ YEAR : num 1906 1906 1906 1906 1906 ...
## $ MONTH: num 3 4 5 6 7 8 9 10 11 12 ...
##
## Call:
## lm(formula = TMAX ~ YEAR, data = MonthlyTMAXMean[MonthlyTMAXMean$Month ==
## "05", ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1256 -1.8211 -0.2484 1.6681 7.0097
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.304444 14.555709 0.502 0.617
## YEAR 0.004603 0.007405 0.622 0.536
##
## Residual standard error: 2.458 on 105 degrees of freedom
## Multiple R-squared: 0.003667, Adjusted R-squared: -0.005822
## F-statistic: 0.3864 on 1 and 105 DF, p-value: 0.5355
The p-value for the month of May is 0.54 (0.5355 in the output above), which means that the trend in May maximum temperature is not statistically significant.
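To read the slope and its p-value directly instead of scanning the printed summary, they can be pulled out of the coefficient table. This is a small sketch on synthetic data: `may_demo` is a hypothetical data frame with no built-in trend, and the 0.05 cutoff is the conventional threshold, which I am assuming is the one intended here.

```r
# Synthetic May means for 107 years with no real trend (values are made up)
set.seed(42)
may_demo <- data.frame(YEAR = 1906:2012)
may_demo$TMAX <- 16 + rnorm(nrow(may_demo), sd = 2.5)

fit <- lm(TMAX ~ YEAR, data = may_demo)

# The coefficient table is a matrix, so rows and columns can be indexed by name
coefs   <- summary(fit)$coefficients
slope   <- coefs["YEAR", "Estimate"]
p_value <- coefs["YEAR", "Pr(>|t|)"]

alpha <- 0.05  # assumed significance threshold
cat("slope per year:", round(slope, 4), "\n")
cat("p-value:", round(p_value, 4), "\n")
cat("significant at", alpha, "?", p_value < alpha, "\n")
```

With a single predictor, the p-value of the slope is the same as the p-value of the overall F-test, which is why both show 0.5355 in the May output.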
## Month Year TMIN YEAR MONTH
## 1 03 1906 -9.4741935 1906 3
## 2 04 1906 -4.2800000 1906 4
## 3 05 1906 -0.8806452 1906 5
## 4 06 1906 1.9033333 1906 6
## 5 07 1906 6.9064516 1906 7
## 6 08 1906 4.3548387 1906 8
## % latex table generated in R 3.6.0 by xtable 1.8-4 package
## % Sun Sep 13 15:22:51 2020
## \begin{table}[ht]
## \centering
## \begin{tabular}{rlllll}
## \hline
## & Month & Slope TMIN & R\verb|^|2 TMIN & Slope TMAX & R\verb|^|2 TMAX \\
## \hline
## 1 & January & 0.0429 *** & 0.146 & 0.021 ** & 0.095 \\
## 2 & February & 0.0346 *** & 0.121 & 0.0128 NS & 0.029 \\
## 3 & March & 0.038 *** & 0.231 & 0.0169 * & 0.048 \\
## 4 & April & 0.022 *** & 0.17 & 0.0075 NS & 0.008 \\
## 5 & May & 0.018 *** & 0.18 & 0.0046 NS & 0.004 \\
## 6 & June & 0.0119 ** & 0.074 & 0.0112 NS & 0.025 \\
## 7 & July & 0.0036 NS & 0.004 & 0.0019 NS & 0.001 \\
## 8 & August & 0.0179 *** & 0.098 & 0.0173 ** & 0.069 \\
## 9 & September & 0.0116 * & 0.042 & 0.0141 NS & 0.035 \\
## 10 & October & -7e-04 NS & 0 & 0.0057 NS & 0.005 \\
## 11 & November & 0.0103 NS & 0.024 & -0.0031 NS & 0.002 \\
## 12 & December & 0.008 NS & 0.008 & -0.0015 NS & 0.001 \\
## \hline
## \end{tabular}
## \end{table}
This graph shows the monthly minimum temperatures in McCall.
Error: Incomplete expression: Results <- data.frame(Month = TMINresult[c(2:13),1], TMINSlope = TMINresult[c(2:13),2], TMIN_P = as.numeric(TMINresult[c(2:13),3]), TMINRsq = TMINresult[c(2:13),4],
#Error in TMAXresult[c(2:13), 2] : incorrect number of dimensions
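An "incorrect number of dimensions" error usually means that the object being indexed with `[rows, columns]` is a plain vector rather than a matrix or data frame. One way to sidestep that is to collect the slope, p-value and R-squared for each month in a small helper and bind the results into a matrix. The sketch below is one possible approach, not the code that produced the table above: `grid`, `trend_stats`, `tmin` and `tmax` are hypothetical names and the data are synthetic.

```r
# Synthetic monthly means for 12 months x 107 years (values are made up)
set.seed(7)
grid <- expand.grid(MONTH = 1:12, YEAR = 1906:2012)
grid$TMIN <- -5 + 0.02 * (grid$YEAR - 1906) + rnorm(nrow(grid), sd = 2)
grid$TMAX <- 10 + 0.01 * (grid$YEAR - 1906) + rnorm(nrow(grid), sd = 3)

# Fit variable ~ YEAR for one month and return slope, p-value and R-squared
trend_stats <- function(data, variable, month) {
  sub <- data[data$MONTH == month, ]
  s <- summary(lm(reformulate("YEAR", response = variable), data = sub))
  c(slope = s$coefficients["YEAR", "Estimate"],
    p     = s$coefficients["YEAR", "Pr(>|t|)"],
    rsq   = s$r.squared)
}

# sapply() returns a 3 x 12 matrix; t() turns it into one row per month
tmin <- t(sapply(1:12, function(m) trend_stats(grid, "TMIN", m)))
tmax <- t(sapply(1:12, function(m) trend_stats(grid, "TMAX", m)))

Results <- data.frame(
  Month     = month.name,
  TMINSlope = round(tmin[, "slope"], 4),
  TMIN_P    = tmin[, "p"],
  TMINRsq   = round(tmin[, "rsq"], 3),
  TMAXSlope = round(tmax[, "slope"], 4),
  TMAX_P    = tmax[, "p"],
  TMAXRsq   = round(tmax[, "rsq"], 3)
)
print(Results)
```

Because `tmin` and `tmax` are always matrices with named columns, indexing them by column name cannot fail the way two-index subsetting of a vector does.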
## Precipitation: Departure from Mean
climate_data$PRCP[climate_data$PRCP==-9999] <- NA
Missing <- aggregate(is.na(climate_data$PRCP),
list(climate_data$Month, climate_data$Year), sum)
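# aggregate() builds a summarized dataset. Here it counts the missing
# PRCP values (the sum of is.na) for each month and year.
# Note: this gives month + year/12 (e.g. 159.9167 for January 1907), not a
# decimal year, which would be year + (month - 1)/12.
Missing$Date = as.numeric(Missing$Group.1) + as.numeric(Missing$Group.2)/12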
plot(x ~ Date, data=Missing)
The graph above shows how many daily precipitation values are missing in each month. Next, I look at how much monthly precipitation deviated from the mean.
# Aggregate by month and year to get monthly totals
TotalPPT <- aggregate(climate_data$PRCP,
list(climate_data$Month, climate_data$Year), sum, na.rm=TRUE)
names(TotalPPT) = c("Group.1", "Group.2", "ppt")
# Keep only the months with fewer than 5 missing days
NonMissing <- Missing[Missing$x < 5, c(1:3)]
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
PPT <- merge(TotalPPT, NonMissing, all.y=TRUE)
# Same month + year/12 value as Missing$Date (not a decimal year)
PPT$Date <- as.numeric(PPT$Group.1) + as.numeric(PPT$Group.2)/12
head(PPT)
## Group.1 Group.2 ppt x Date
## 1 01 1907 134.0 0 159.9167
## 2 01 1908 56.7 0 160.0000
## 3 01 1909 224.0 0 160.0833
## 4 01 1910 81.4 0 160.1667
## 5 01 1917 59.9 0 160.7500
## 6 01 1918 110.6 0 160.8333
#Finding the mean
PRCP_mean = mean(PPT$ppt)
plot(ppt~Date, data=PPT)
abline(h=PRCP_mean, col="blue")
The mean looks fairly meaningless here because the data is so scattered.
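One way to make the comparison with the mean easier to read is to plot the departure itself (each monthly total minus the mean) against a true decimal year. The sketch below uses synthetic totals: `ppt_demo` and its columns are hypothetical stand-ins for `PPT`, and the decimal-year formula `Year + (Month - 1)/12` is my assumption about what the x-axis is meant to show.

```r
# Synthetic monthly precipitation totals for 10 years (values are made up)
set.seed(3)
ppt_demo <- data.frame(
  Year  = rep(1907:1916, each = 12),
  Month = rep(1:12, times = 10)
)
ppt_demo$ppt <- round(rgamma(nrow(ppt_demo), shape = 2, scale = 30), 1)

# Decimal year: January 1907 -> 1907.000, February 1907 -> 1907.083, ...
ppt_demo$Date <- ppt_demo$Year + (ppt_demo$Month - 1) / 12

# Departure of each monthly total from the overall mean
overall_mean <- mean(ppt_demo$ppt)
ppt_demo$departure <- ppt_demo$ppt - overall_mean

# Vertical bars above and below zero show wet and dry months
plot(departure ~ Date, data = ppt_demo, type = "h", las = 1,
     ylab = "Departure from mean")
abline(h = 0, col = "blue")

head(ppt_demo)
```

Plotted this way, the points run in time order along the x-axis and the zero line plays the role of the mean.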
# Looking at a few months: this code will not run, so I commented it all out
#STATION£PRCP[STATION~PRCP==-9999] <- NA
#YearlySum = aggregate(PRCP ~ Year, NAME, sum)
#YearlySum£YEAR = as.numeric(YearlySum£Year)
#YearlyMean = mean(YearlySum£PRCP)
#plot(PRCP~YEAR, data=YearlySum, las=1, ty="p")
#abline(h=YearlyMean, col="blue")
#YearlySum.lm = lm(PRCP~YEAR, data=YearlySum)
#abline(coef(YearlySum.lm), col="green")
#n <- 5
#k <- rep(1/n, n)
#k
#y_lag <- stats::filter(YearlySum£PRCP, k, sides=1)
#lines(YearlySum~YEAR, y_lag, col="red")
#summary(YearlySum.lm)
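This code did not work for me. I received two errors: Error: unexpected input in “STATION�”, and Error in eval(predvars, data, env) : object ‘YEAR’ not found. The first is most likely caused by the £ characters, which R cannot parse where a $ is expected. The second most likely appears because the YEAR column was never created, so the plot() call could not find it.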
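For comparison, here is a version of the yearly analysis that does run. It is a sketch under assumptions, not a confirmed fix for the original: `demo` is a synthetic stand-in for `climate_data`, which I am assuming has `Year` and `PRCP` columns. I replaced `STATION` and `NAME` with the data frame because they appear in the data as columns, not data frames. I also changed the last `lines()` call to take x and y vectors.

```r
# Synthetic daily precipitation for 30 years (values are made up)
set.seed(11)
days <- seq(as.Date("1950-01-01"), as.Date("1979-12-31"), by = "day")
demo <- data.frame(
  Year = format(days, "%Y"),
  PRCP = rgamma(length(days), shape = 0.3, scale = 6)
)
demo$PRCP[sample(nrow(demo), 50)] <- -9999  # mimic the missing-value code

# Recode the missing-value flag as NA
demo$PRCP[demo$PRCP == -9999] <- NA

# Yearly totals; the formula interface drops NA rows by default
YearlySum <- aggregate(PRCP ~ Year, data = demo, FUN = sum)
YearlySum$YEAR <- as.numeric(as.character(YearlySum$Year))
YearlyMean <- mean(YearlySum$PRCP)

plot(PRCP ~ YEAR, data = YearlySum, las = 1, type = "p")
abline(h = YearlyMean, col = "blue")

# Linear trend through the yearly totals
YearlySum.lm <- lm(PRCP ~ YEAR, data = YearlySum)
abline(YearlySum.lm, col = "green")

# 5-year trailing moving average (the first 4 values are NA)
n <- 5
k <- rep(1 / n, n)
y_lag <- stats::filter(YearlySum$PRCP, k, sides = 1)
lines(YearlySum$YEAR, as.numeric(y_lag), col = "red")

summary(YearlySum.lm)
```

Calling `stats::filter()` with the package prefix matters here because loading `dplyr` masks `filter`, as the attach message earlier in the post shows.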
par(mfrow=c(2,2))
plot(lm(TMIN ~ YEAR, data=MonthlyTMINMean[MonthlyTMINMean$MONTH==1,]))
These diagnostic graphs check whether the model follows the assumptions of our statistical test. According to the Q-Q plot, the residuals look pretty normal.